A bootstrapping algorithm to improve cohort identification using structured data

Authors

  • Sasikiran Kandula
  • Qing Zeng-Treitler
  • Lingji Chen
  • William L. Salomon
  • Bruce E. Bray
Abstract

Cohort identification is an important step in conducting clinical research studies. Identifying disease cohorts by ICD-9 codes is a common approach that can yield satisfactory results for certain conditions; for many use cases, however, more accurate methods are required. In this study, we propose a bootstrapping method that supplements ICD-9 codes with other structured data, such as lab results and medications, to build classification models that identify cohorts more accurately. The proposed method does not require prior knowledge of the patients' true class. We used the method to identify Diabetes Mellitus (DM) and Hyperlipidemia (HL) patient cohorts from a database of 800,000 patients. The method labeled as positive for DM 11,000 patients who had no DM-related ICD-9 codes, and labeled as positive for HL 52,000 patients who had no HL-related codes. Two clinicians reviewed 400 patient charts (200 per condition). For both conditions, the labels assigned by the proposed approach agreed more closely with the clinicians' labels than the labels derived from ICD-9 codes did. The method is reasonably automated and, we believe, holds potential for inexpensive and more accurate cohort identification.
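The abstract describes the approach only at a high level. To illustrate the general idea of bootstrapping (self-training) from noisy, code-based labels without a gold standard, here is a minimal sketch. It is not the authors' published algorithm. The nearest-centroid classifier, the rule that patients with a code always stay positive, the stopping criterion, the feature choice, the toy values, and the names `Patient` and `bootstrap_labels` are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Patient:
    # Hypothetical structured features; NOT the paper's actual feature set.
    features: list[float]  # e.g. [HbA1c %, fasting glucose mg/dL, on-medication flag]
    has_icd9: bool         # True if the record carries a condition-related ICD-9 code


def _standardize(rows):
    """Z-score each feature column so no single lab value dominates distances."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    # A constant column has std 0; fall back to 1.0 to avoid dividing by zero.
    stds = [sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0 for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def _centroid(rows):
    d = len(rows[0])
    return [sum(r[j] for r in rows) / len(rows) for j in range(d)]


def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bootstrap_labels(patients, max_iter=10):
    """Hypothetical self-training loop seeded with ICD-9 labels.

    Patients with a code stay positive (an assumption of this sketch). Patients
    without a code are relabeled by a nearest-centroid classifier trained on the
    current labels, repeating until the labels stop changing or max_iter is hit.
    """
    x = _standardize([p.features for p in patients])
    labels = [p.has_icd9 for p in patients]  # noisy initial labels, no gold standard
    for _ in range(max_iter):
        pos = [xi for xi, y in zip(x, labels) if y]
        neg = [xi for xi, y in zip(x, labels) if not y]
        if not pos or not neg:
            break  # the classifier needs examples of both classes
        c_pos, c_neg = _centroid(pos), _centroid(neg)
        new_labels = [
            p.has_icd9 or _sq_dist(xi, c_pos) < _sq_dist(xi, c_neg)
            for p, xi in zip(patients, x)
        ]
        if new_labels == labels:
            break  # converged: no patient changed class
        labels = new_labels
    return labels


# Toy cohort with made-up values: [HbA1c %, fasting glucose mg/dL, on-medication flag]
cohort = [
    Patient([8.1, 160.0, 1.0], True),   # coded
    Patient([7.9, 150.0, 1.0], True),   # coded
    Patient([8.4, 170.0, 1.0], False),  # labs suggest the condition, but no code
    Patient([5.2, 90.0, 0.0], False),
    Patient([5.0, 85.0, 0.0], False),
    Patient([5.4, 95.0, 0.0], False),
]
print(bootstrap_labels(cohort))  # → [True, True, True, False, False, False]
```

In this toy run, the uncoded third patient is pulled into the positive cohort on the first pass, and the labels are unchanged on the second pass. This loosely mirrors, at toy scale, the abstract's observation that patients without condition-related codes can still be labeled positive. A real implementation would use a stronger classifier and would need a policy for coded patients who look negative.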

Similar articles

A New Robust Bootstrap Algorithm for the Assessment of Common Set of Weights in Performance Analysis

The performance of the units is defined as the ratio of the weighted sum of outputs to the weighted sum of inputs. These weights can be determined by data envelopment analysis (DEA) models. The inputs and outputs of each Decision Making Unit (DMU) are assessed using a set of weights obtained via DEA for that DMU. In addition, the weights are not generally common, but rather, they are ve...

A Hybrid Data Clustering Algorithm Using Modified Krill Herd Algorithm and K-MEANS

Data clustering is the process of partitioning a set of data objects into meaningful clusters or groups. Because clustering algorithms are widely used in many fields, much research is still devoted to finding the best and most efficient clustering algorithm. K-means is simple and easy to implement, but it is sensitive to the initialization of cluster centers and can therefore become trapped in a local optimum. In this paper...

Bootstrapping structure using similarity

In this paper a new similarity-based learning algorithm, inspired by string edit-distance (Wagner and Fischer, 1974), is applied to the problem of bootstrapping structure from scratch. The algorithm takes a corpus of unannotated sentences as input and returns a corpus of bracketed sentences. The method works on pairs of unstructured sentences or sentences partially bracketed by the algorithm th...

Fault Identification using end-to-end data by imperialist competitive algorithm

Faults in computer networks may result in millions of dollars in cost. Faults in a network need to be localized and repaired to maintain the health of the network. Fault management systems are used to keep today's complex networks running without significant cost, using either active or passive techniques. In this paper, we propose a novel approach based on imperialist competitive alg...

Parameters Identification of an Experimental Vision-based Target Tracker Robot Using Genetic Algorithm

In this paper, the uncertain dynamic parameters of an experimental target tracker robot are identified through the application of genetic algorithm. The considered serial robot is a two-degree-of-freedom dynamic system with two revolute joints in which damping coefficients and inertia terms are uncertain. First, dynamic equations governing the robot system are extracted and then, simulated nume...

Journal title:
  • Journal of biomedical informatics

Volume 44, Suppl 1

Publication date 2011